A Discriminative Classifier Learning Approach to Image Modeling and Spam Image Identification

Authors

  • Byungki Byun
  • Chin-Hui Lee
  • Steve Webb
  • Calton Pu
Abstract

We propose a discriminative classifier learning approach to image modeling for spam image identification. We analyze a large number of images extracted from the SpamArchive spam corpora and identify four key spam image properties: color moment, color heterogeneity, conspicuousness, and self-similarity. These properties emerge from a large variety of spam images and are more robust than simply using visual content to model images. We apply multi-class characterization to model images sent with emails. A maximal figure-of-merit (MFoM) learning algorithm is then proposed to design classifiers for spam image identification. Experimental results on about 240 spam and legitimate images show that multi-class characterization is more suitable than single-class characterization for spam image identification. Our proposed framework classifies 81.5% of spam images correctly and misclassifies only 5.6% of legitimate images. We also demonstrate the generalization capabilities of our proposed framework on the TREC 2005 email corpus, where multi-class characterization again outperforms single-class characterization. Our results show that the technique operates robustly, even when the images in the testing set are very different from the training images.
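The abstract names color moment as one of the four image properties but does not say how it is computed. As an illustration only, the sketch below uses one common definition of color moments: the per-channel mean, standard deviation, and signed cube root of the third central moment. The function name `color_moments` and the input format (a list of `(r, g, b)` tuples) are assumptions made for this example and are not taken from the paper.

```python
import math


def color_moments(pixels):
    """Return [(mean, std, skew), ...] for the R, G and B channels.

    `pixels` is a non-empty list of (r, g, b) tuples. This follows a common
    textbook definition of color moments; it is an assumption, not
    necessarily the feature definition used by the authors.
    """
    if not pixels:
        raise ValueError("pixels must be non-empty")
    n = len(pixels)
    moments = []
    for channel in range(3):
        values = [p[channel] for p in pixels]
        mean = sum(values) / n
        # Second and third central moments of the channel values.
        variance = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        std = math.sqrt(variance)
        # Signed cube root keeps the skew term in the same units as the mean.
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        moments.append((mean, std, skew))
    return moments


if __name__ == "__main__":
    # A tiny hypothetical 2-pixel "image": one black and one white pixel.
    print(color_moments([(0, 0, 0), (255, 255, 255)]))
```

Under this definition each image yields a nine-value vector (three moments for each of three channels), which could then be passed to any classifier.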

Similar resources

Automated Tumor Segmentation Based on Hidden Markov Classifier using Singular Value Decomposition Feature Extraction in Brain MR images

Introduction: Diagnosing brain tumor is not always easy for doctors, and existence of an assistant that facilitates the interpretation process is an asset in the clinic. Computer vision techniques are devised to aid the clinic in detecting tumors based on a database of tumor c...

Sample-oriented Domain Adaptation for Image Classification

Image processing is a method to perform some operations on an image, in order to get an enhanced image or to extract some useful information from it. Conventional image processing algorithms do not perform well in scenarios where the training images (source domain) used to learn the model have a different distribution from the test images (target domain). Also, many real-world applicat...

Active Learning Image Spam Hunter

Image spam is annoying email users around the world. Most previous work for image spam detection focuses on supervised learning approaches. However, it is costly to get enough trustworthy labels for learning, especially for an adversarial problem where spammers constantly modify patterns to evade the classifier. To address this issue, we employ the principle of active learning where the learner...

Detecting Image Spam Using Image Texture Features

Filtering image email spam is considered to be a challenging problem because spammers keep modifying the images being used in their campaigns by employing different obfuscation techniques. These techniques prevent text recognition using Optical Character Recognition (OCR) tools and impose additional challenges in filtering this type of spam. In this paper, we propose an image spam filtering tech...

A Robust Structural Fingerprint Restoration

Fast and accurate ridge detection in fingerprints is essential to every AFIS (Automatic Fingerprint Identification System). Smudged furrows and cut ridges in the image of a fingerprint are major problems in any AFIS. This paper investigates a new online ridge detection method that reduces the complexity and costs associated with the fingerprint identification procedure. The noise in fingerprint...

Publication date: 2007